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—— Abstract 


Gossip protocols (also called rumor spreading or epidemic protocols) are widely used to disseminate 
information in massive peer-to-peer networks. These protocols are often claimed to guarantee privacy 
because of the uncertainty they introduce on the node that started the dissemination. But is that 
claim really true? Can the source of a gossip safely hide in the crowd? This paper examines, for the 
first time, gossip protocols through a rigorous mathematical framework based on differential privacy 
to determine the extent to which the source of a gossip can be traceable. Considering the case of a 
complete graph in which a subset of the nodes are curious, we study a family of gossip protocols 
parameterized by a “muting” parameter s: nodes stop emitting after each communication with a 
fixed probability 1 — s. We first prove that the standard push protocol, corresponding to the case 
s = 1, does not satisfy differential privacy for large graphs. In contrast, the protocol with s = 0 
(nodes forward only once) achieves optimal privacy guarantees but at the cost of a drastic increase 
in the spreading time compared to standard push, revealing an interesting tension between privacy 
and spreading time. Yet, surprisingly, we show that some choices of the muting parameter s lead to 
protocols that achieve an optimal order of magnitude in both privacy and speed. Privacy guarantees 
are obtained by showing that only a small fraction of the possible observations by curious nodes 
have different probabilities when two different nodes start the gossip, since the source node rapidly 
stops emitting when s is small. The speed is established by analyzing the mean dynamics of the 
protocol, and leveraging concentration inequalities to bound the deviations from this mean behavior. 
We also confirm empirically that, with appropriate choices of s, we indeed obtain protocols that are 
very robust against concrete source location attacks (such as maximum a posteriori estimates) while 
spreading the information almost as fast as the standard (and non-private) push protocol. 
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[I Introduction 


Gossip protocols (also called rumor spreading or epidemic protocols), in which participants 
randomly choose a neighbor to communicate with, are both simple and efficient means 
to exchange information in P2P networks [I] [2] B| [4]. They are a basic building block to 
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propagate and aggregate information in distributed databases [5][6] and social networks [7] [8], 
to model the spread of infectious diseases [9], as well as to train machine learning models on 


distributed datasets [T0] [1] [T2] [13]. 


Some of the information gossiped may be sensitive, and participants sharing it may not 
want to be identified. This can for instance be the case of whistle-blowers or individuals that 
would like to exercise their right to freedom of expression in totalitarian regimes. Conversely, 
it may sometimes be important to locate the source of a (computer or biological) virus, or 
fake news, in order to prevent it from spreading before too many participants get “infected”. 


There is a folklore belief that gossip protocols inherently guarantee some form of source 
anonymity because participants cannot know who issued the information in the first place 
[4]. Similarly, identifying “patient zero” for real-world epidemics is known to be a very hard 
task. Intuitively indeed, random and local exchanges make identification harder. But to what 
extent? Although some work has been devoted to the design of source location strategies 
in specific settings [15] (16) [7], the general anonymity claim has never been studied from 
a pure privacy perspective, that is, independently of the very choice of a source location 
technique. Depending on the use-case, it may be desirable to have strong privacy guarantees 
(e.g., in anonymous information dissemination) or, on the contrary, we may hope for weak 
guarantees, e.g., when trying to identify the source of an epidemic. In both cases, it is crucial 
to precisely quantify the anonymity level of gossip protocols and study its theoretical limits 
through a principled approach. This is the challenge we take up in this paper for the classic 
case of gossip dissemination in a complete network graph. 


Our first contribution is an information-theoretic model of anonymity in gossip protocols 
based on (e,6)-differential privacy (DP) [I8] [9]. Originally introduced in the database 
community, DP is a precise mathematical framework recognized as the gold standard for 
studying the privacy guarantees of information release protocols. In our proposed model, 
the information to protect is the source of the gossip, and an adversary tries to locate the 
source by monitoring the communications (and their relative order) received by a subset of f 
curious nodes. In a computer network, these curious nodes may have been compromised by a 
surveillance agency; in our biological example, they could correspond to health professionals 
who are able to identify whether a given person is infected. Our notion of DP then requires 
that the probability of any possible observation of the curious nodes is almost the same no 
matter who is the source, thereby limiting the predictive power of the adversary regardless of 
its actual source location strategy. A distinctive aspect of our model is that the mechanism 
that seeks to ensure DP comes only from the natural randomness and partial observability of 
gossip protocols, not from additional perturbation or noise which affects the desired output as 
generally needed to guarantee DP [20]. We believe our adaptation of DP to the gossip context 
to be of independent interest. We also complement it with a notion of prediction uncertainty 
which guarantees that even unlikely events do not fully reveal the identity of the source 
under a uniform prior on the source. This property directly upper bounds the probability of 
success of any source location attack, including the maximum likelihood estimate. 


We use our proposed model to study the privacy guarantees of a generic family of gossip 
protocols parameterized by a muting parameter s: nodes have a fixed probability 1 — s 
to stop emitting after each communication (until they receive the rumor again). In our 
biological parallel, this corresponds to the fact that a person stops infecting other people 
after some time. The muting parameter captures the ability of the protocol to forget initial 
conditions, thereby helping to conceal the identity of the source. In the extreme case where 
s = 1, we recover the standard “push” gossip protocol [2], and show that it is inherently not 
differentially private for large graphs. In contrast, we also show that, at the other end of the 
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Muting param. | 6 ensuring (0,6)-DP | Spreading time 
(minimal nes speed) ni. 1 O(log n) 
a es Sm £ O (nlogn) 
a E st(1—s)£ | O(log(n)/s) 


B Table 1 Summary of results to illustrate the tension between privacy and speed. n is the total 
number of nodes and f/n is the fraction of curious nodes in the graph. ô € [0, 1] quantifies differential 
privacy guarantees (smaller is better). Spreading time is asymptotic in n. 


spectrum, choosing s = 0 leads to optimal privacy guarantees among all gossip protocols. 

More generally, we determine matching upper and lower bounds on the privacy guarantees 
of gossip protocols. Essentially, our upper bounds on privacy are obtained by tightly lower 
bounding the probability that the source node contacts a curious node before another node 
does, and upper bounding the probability that this happens for a random node fixed in 
advance, in a way that holds for all gossip algorithms. Remarkably, despite the fact that the 
source node always has a non-negligible probability of telling the rumor to a curious node 
first, our results highlight the fact that setting s = 0 leads to strong privacy guarantees in 
several regimes, including the pure (€,0)-DP as well as prediction uncertainty. 

It turns out that, although achieving optimal privacy guarantees, choosing s = 0 leads 
to a very slow spreading time (log-linear in the number of nodes n). This highlights an 
interesting tension between privacy and spreading time: the two extreme values for the 
muting parameter s recover the two extreme points of this trade-off. We then show that more 
balanced trade-offs can be achieved: appropriate choices of the muting parameter lead to 
gossip protocols that are near-optimally private with a spreading time that is logarithmic in 
the size of the graph. In particular, the trade-off between privacy and speed shows up in the 
constants but, surprisingly, some choices of the parameter lead to protocols that achieve an 
optimal order of magnitude for both aspects. Our results on this trade-off are summarized in 
Table[1] for a proportion f/n of curious nodes, one can see that setting the muting parameter 
s = f/n achieves almost optimal privacy (up to a factor 2) while being substantially faster 
than s = 0 (optimal up to a factor f/n). Similarly, if one wants to achieve (0, 6o)-differential 
privacy with ðo > 2f/n, then it is possible to set s = 69/2 and obtain a protocol that respects 
the privacy constraint with spreading time O(log(n)/d9). From a technical perspective, 
these privacy results are obtained by showing that only a small fraction of the possible 
observations by curious nodes have different probabilities when two different nodes start 
with the gossip. This requires to precisely evaluate the probability of well-chosen worst-case 
sequences, which is generally hard as randomness is involved both when nodes decide to stop 
spreading the rumor (with probability 1 — s) and when they choose who to communicate 
with. Our parameterized gossip protocol can be seen as a population protocol [21], and we 
prove its speed by analyzing its mean dynamics and leveraging concentration inequalities to 
bound the deviations from the mean dynamics. 

We support our theoretical findings by an empirical study of our parameterized gossip 
protocols. The results show that appropriate choices of s lead to protocols that are very 
robust against classical source location attacks (such as maximum a posteriori estimates) 
while spreading the information almost as fast as the standard (and non-private) push 
protocol. Crucially, we observe that our differential privacy guarantees are very well aligned 
with the ability to withstand attacks that leverage background information, e.g., targeting 
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known activists or people who have been to certain places. 

The rest of the paper is organized as follows. We first discuss related work and formally 
introduce our concept of differential privacy for gossip. Then, we study two extreme cases 
of our parameterized gossip protocol: the standard push protocol, which we show is not 
private, and a privacy-optimal but slow protocol. This leads us to investigate how to better 
control the trade-off between speed and privacy. Finally, we present our empirical study and 
conclude by discussing open questions. 

For pedagogical reasons, we keep our model relatively simple to avoid unnecessary 
technicalities in the derivation and presentation of our results. For completeness, we discuss 
the impact of possible extensions (e.g., information observed by the adversary, malicious 
behavior, termination criterion) in Appendix [A] For space limitations, some detailed proofs 
are also deferred to the appendix. 


“2 Background and Related Work 


2.1 Gossiping 


The idea of disseminating information in a distributed system by having each node push 
messages to a randomly chosen neighbor, initially coined the random phone-call model, dates 
back to even before the democratization of the Internet [2]. Such protocols, later called 
gossip, epidemic or rumor spreading, were for instance applied to ensure the consistency of 
a replicated database system [5]. They have gained even more importance when argued to 
model spreading of infectious diseases [9] and information dissemination in social networks 
[A [8]. Gossip protocols can also be used to compute aggregate queries on a database 
distributed across the nodes of a network [22] [6], and have recently become popular in 
federated machine learning [23] to optimize cost functions over data distributed across a large 
set of peers [10] [1 [12] 13]. Gossip protocols differ according to their interaction schemes, 
i.e., pull or push, sometimes combining both [3] [24] 25]. 

In this work, we focus on the classical push form in the standard case of a complete graph 
with n nodes (labeled from 0 to n — 1). We now define its key communication primitive. 
Denoting by J the set of informed nodes, tell_gossip(#, J) allows an informed node i € I 
to tell the information to another node j € {0,...,2 — 1} chosen uniformly at random. 
tell_gossip(i, I) returns j (the node that received the message) and the updated I (the new 
set of informed nodes that includes j). Equipped with this primitive, we can now formally 
define the class of gossip protocols that we consider in this paper. 


> Definition 1 (Gossip protocols). A gossip protocol on a complete graph is one that (a) 
terminates almost surely, (b) ensures that at the end of the execution the set of informed 
nodes I is equal to {0,...,n — 1}, and (c) can modify I only through calls to tell_gossip. 


2.2 Locating the Source of the Gossip 


Determining the source of a gossip is an active research topic, especially given the potential 
applications to epidemics and social networks, see for a recent survey. Existing approaches 
have focused so far on building source location attacks that compute or approximate the 
maximum likelihood estimate of the source given some observed information. Each approach 
typically assumes a specific kind of graphs (e.g., trees, small world, etc.), dissemination 
model and observed information. In rumor centrality [I7], the gossip communication graph 
is assumed to be fully observed and the goal is to determine the center of this graph to 
deduce the node that started the gossip. Another line of work studies the setting in which 
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some nodes are curious sensors that inform a central entity when they receive a message 
[16]. Gossiping is assumed to happen at random times and the source node is estimated by 
comparing the different timings at which information reaches the sensors. The proposed 
attack is natural in trees but does not generalize to highly connected graphs. The work 
of [26] focuses on hiding the source instead of locating it. The observed information is a 
snapshot of who has the rumor at a given time. A specific dissemination protocol is proposed 
to hide the source but the privacy guarantees only hold for tree graphs. 

We emphasize that the privacy guarantees (i.e., the probability not to be detected) that 
can be derived from the above work only hold under the specific attacks considered therein. 
Furthermore, all approaches rely on maximum likelihood and hence assume a uniform prior 
on the probability of each node to be the source. The guarantees thus break if the adversary 
knows that some of the nodes could not have started the rumor, or if he is aware that the 
protocol is run twice from the same source. 

We note that other problems at the intersection of gossip protocols and privacy have 
been investigated in previous work, such as preventing unintended recipients from learning 
the rumor [27], and hiding the initial position of agents in a distributed system [28]. 


2.3 Differential Privacy 


While we borrow ideas from the approaches mentioned above (e.g., we assume that a subset 
of nodes are curious sensors as in [16]), we aim at studying the fundamental limits of 
any source location attack by measuring the amount of information leaked by a gossip 
scheme about the identity of the source. For this purpose, a general and robust notion of 
privacy is required. Differential privacy [18] 20] has emerged as a gold standard for it holds 
independently of any assumption on the model, the computational power, or the background 
knowledge that the adversary may have. Differentially private protocols have been proposed 
for numerous problems in the fields of databases, data mining and machine learning: examples 
include computing aggregate and linear counting queries [20], releasing and estimating graph 
properties [29] [30], clustering [31], empirical risk minimization [32] and deep learning [33]. 
In this work, we consider the classic version of differential privacy which involves two 
parameters €, ô > 0 that quantify the privacy guarantee [19]. More precisely, given any two 
databases D; and D, that differ in at most one record, a randomized information release 
protocol P is said to guarantee (e, 6)-differential privacy if for any possible output S: 


p(P(D1) € S) < ep(P(Do) € S) + 6, (1) 


where p(E) denotes the probability of event E. Parameter € places a bound on the ratio of 
the probability of any output when changing one record of the database, while parameter 6 
is assumed to be small and allows the bound to be violated with small probability. When 
€ = 0, ô gives a bound on the total variation distance between the output distributions, while 
ô = 0 is sometimes called “pure” e-differential privacy. DP guarantees hold regardless of the 
adversary and its background knowledge about the records in the database. In our context, 
the background information could be the knowledge that the source is among a subset of all 
nodes. Robustness against such background knowledge is crucial in some applications, for 
instance when sharing secret information that few people could possibly know or when the 
source of an epidemic is known to be among people who visited a certain place. Another 
key feature of differential privacy is composability: if (€,6)-differential privacy holds for a 
release protocol, then querying this protocol two times about the same dataset satisfies 
(2e, 26)-differential privacy. This is important for rumor spreading as it enables to quantify 
privacy when the source propagates multiple rumors that the adversary can link to the same 
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source (e.g., due to the content of the message). We will see in Section [6|that these properties 
are essential in practice to achieve robustness to concrete source location attacks. 

Existing differentially private protocols typically introduce additional perturbation (also 
called noise) to hide critical information [20]. In contrast, an original aspect of our work 
is that we will solely rely on the natural randomness and limited observability brought by 
gossip protocols to guarantee differential privacy. 


‘3. A Model of Differential Privacy for Gossip Protocols 


Our first contribution is a precise mathematical framework for studying the fundamental 
privacy guarantees of gossip protocols. We formally define the inputs of the gossip protocols 
introduced in Definition [I] the outputs observed by the adversary during their execution, 
and the privacy notions we investigate. To ease the exposition, we adopt the terminology of 
information dissemination, but we sometimes illustrate the ideas in the context of epidemics. 


3.1 Inputs and Outputs 


As described in Section [2.3] differential privacy is a probabilistic notion that measures the 
privacy guarantees of a protocol based on the variations of its output distribution for a change 
in its input. In this paper, we adapt it to our gossip context. We first formalize the inputs 
and outputs of gossip protocols, in the case of a single piece of information to disseminate 
(multiple pieces can be addressed through composition, see Section [2.3). At the beginning 
of the protocol, a single node, the source, has the information (the gossip, or rumor). This 
node defines the input of the gossip protocol, and it is the actual “database” that we want to 
protect. Therefore, in our context, input databases in Equation have only 1 record, which 
contains the identity of the source (an integer between 0 and n — 1). Therefore, all possible 
input databases differ in at most one record, and differential privacy aims at protecting the 
content of the database, i.e., which node started the rumor. 

We now turn to the outputs of a gossip protocol. The execution of a protocol generates an 
ordered sequence Somni of pairs (i, j) of calls to tell_gossip where (Somni)+ corresponds to 
the t-th time the tell_gossip primitive has been called, À is the node on which tell_gossip 
was used and j the node that was told the information. If several calls to tell_gossip 
happen simultaneously, ties are broken arbitrarily. We assume that the messages are received 
in the same order that they are sent. This protocol can thus be seen as an epidemic 
population protocol model [21] in which nodes interact using tell_gossip. The sequence 
Somni Corresponds to the output that would be observed by an omniscient entity who could 
eavesdrop on all communications. It is easy to see that, for any execution, the source can be 
identified exactly from Somni simply by retrieving (Somni)o- 

In this work, we focus on adversaries that monitor a set of curious nodes C of size f, i.e. 
they observe all communications involving a curious node. This model, previously introduced 
in [16], is particularly meaningful in large distributed networks: while it is unlikely that 
an adversary can observe the full state of the network at any given time, compromising or 
impersonating a subset of the nodes appears more realistic. The number of curious nodes 
is directly linked with the release mechanism of DP: while the final state of the system is 
always the same (everyone knows the rumor), the information released depends on which 
messages were received by the curious nodes during the execution. Formally, the output 
disclosed to the adversary during the execution of the protocol, i.e., the information he can 
use to try to identify the source, is a subsequence of Somni as defined below. 
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> Assumption 2. The sequence S observed by the adversary through the (random) execution 
of the protocol is a (random) subsequence S = ((Somni)t|(Somni)t = (i, j) with j E€ C), that 
contains all messages sent to curious nodes. The adversary has access to the relative order 
of tuples in S, which is the same as in Somni, but not to the index t in Somni- 


It is important to note that the adversary does not know which messages were exchanged 
between non-curious nodes. In particular, he does not know how many messages were sent in 
total at a given time. As we focus on complete graphs, knowing which curious node received 
the rumor gives no information on the source node. For a given output sequence S, we write 
Sı = i to denote that the t-th tell_gossip call in S originates from node i. Omitting the 
dependence on S, we also denote t;(j) the time at which node j first receives the message 
(even for the source) and tq(j) the time at which j first communicates with a curious node. 

The ratio f/n of curious nodes determines the probability of the adversary to gather 
information (the more curious nodes, the more information leaks). For a fixed f, the adversary 
only becomes weaker as the network grows bigger. Since we would like to study adversaries 
with fixed power, unless otherwise noted we make the following assumption. 


> Assumption 3. The ratio of curious nodes f /n is constant. 


Finally, we emphasize that we do not restrict the computational power of the adversary. 


3.2 Privacy Definitions 


We now formally introduce our privacy definitions. The first one is a direct application of 
differential privacy (Equation [1) for the inputs and outputs specified in the previous section. 
To ease notations, we denote by Jp the source of the gossip (the set of informed nodes at time 
0), and for any given i € {0,...,n — 1}, we denote by p;(E) = p(E|Io = {i}) the probability 
of event E if node 7 is the source of the gossip. The protocol is therefore abstracted in this 
notation. Denoting by S the set of all possible outputs, we say that a gossip protocol is 
(e, 6)-differentially private if: 


pS) < efpi(S) +8, Ys CS, Vig € {0,.. n — 1}, (2) 


where p(S) is the probability that the output belongs to the set S. This formalizes a notion 
of source indistinguishability in the sense that any set of output which is likely enough to 
happen if node i starts the gossip (say, p;(S) > 6) is almost as likely (up to a e€ multiplicative 
factor) to be observed by the adversary regardless of the source. Note however that when 
ô > 0, this definition can be satisfied for protocols that release the identity of the source 
(this can happen with probability 6). To capture the behavior under unlikely events, we also 
consider the complementary notion of c-prediction uncertainty for c > 0, which is satisfied if 
for a uniform prior p(Jp) on source nodes and any à € {0,...,2 — 1}: 


plo # {#}]S)/pUo = {i}|S) Z c, (3) 


for any S C S such that p;(S) > 0. Prediction uncertainty guarantees that no observable 
output S (however unlikely) can identify a node as the source with large enough probability: 
it ensures that the probability of success of any source location attack is upper bounded 
by 1/(1 +c). This holds in particular for the maximum likelihood estimate. Prediction 
uncertainty does not have the robustness of differential privacy against background knowledge, 
as it assumes a uniform prior on the source. While it can be shown that (e,0)-DP with e > 0 
implies prediction uncertainty, the converse is not true. Indeed, prediction uncertainty is 
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satisfied as soon as no output identifies any node with enough probability, without necessarily 
making all pairs of nodes indistinguishable as required by DP. We will see that prediction 
uncertainty allows to rule out some naive protocols that have nonzero probability of generating 
sequences which reveal the source with certainty. 

Thanks to the symmetry of our problem, we consider without loss of generality that node 
0 starts the rumor (Jp = {0}) and therefore we will only need to verify Equations (2) and 
for à = 0 and j = 1. 


‘4 Extreme Privacy Cases 


In this section, we study the fundamental limits of gossip in terms of privacy. To do so, we 
parameterize gossip protocols by a muting parameter s € [0,1], as depicted in Algorithm [1] 
We thereby capture, within a generic framework, a large family of protocols that fit Definition[1] 
and work as follows. They maintain a set À of active nodes (initialized to the source node) 
which spread the rumor asynchronously and in parallel: this is modeled by the fact that at 
each step of the protocol, a randomly selected node i € A invokes the tell_gossip primitive 
to send the rumor to another node (which in turn becomes active), while à also stays active 
with probability s. This is illustrated in Figure [1] The muting parameter s can be viewed 
as a randomized version of fanout in Baf] Algorithm |1| follows the population protocol 
model [21], and is also related to the SIS epidemic model [9] but in which the rumor never 
dies regardless of the value of s € [0,1] (there always remain some active nodes). Although 
we present it from a centralized perspective, we emphasize that Algorithm [i] is asynchronous 
and can be implemented by having active nodes wake up following a Poisson process. 

In the rest of this section, we show that extreme privacy guarantees are obtained for 
extreme values of the muting parameter s. 


4.1 Standard Push has Minimal Privacy 


The natural case to study first in our framework is when the muting parameter is set to s = 1: 
this corresponds to the standard push protocol [2] in which nodes always keep emitting after 
they receive the rumor. T heorem [4] shows that, surprisingly, the privacy guarantees of this 
protocol become arbitrarily bad as the size of the graph increases (keeping the fraction of 
curious nodes constant). 


> Theorem 4 (Standard push is not differentially private). If Algorithm{1] with s = 1 guarantees 
(e,8)-DP for all values of n and constant e < co, then Ô = 1. 


This result may seem counter-intuitive at first since one could expect that it would be 
more and more difficult to locate the source when the size of the graph increases. Yet, since 
the ratio of curious nodes is kept constant, this result comes from the fact that the event 
{ta(0) < t;(1)} (node 0 communicates with a curious node before node 1 gets the message) 
becomes more and more likely as n grows, hence preventing any meaningful differential 
privacy guarantee when n is large enough. The proof is in Appendix [C.1] T heorem [J] clearly 
highlights the fact that the standard gossip protocol (s = 1) is not differentially private in 
general. We now turn to the other extreme case, where the muting parameter s = 0. 


1 Unlike in the classic fanout, nodes start to gossip again each time they receive a message instead of 
deactivating permanently. 
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© Algorithm 1 Parameterized Gossip 


Require: n {Number of nodes}, k {Source 
node}, s {Probability for a node to remain Pie 





active} 

Ensure: I = {0,...,n — 1} {All nodes are step 1 step 2 step 1 step 2 
informed} yu ji 
I+ {k}, At {k} 
while |I| < n do 

Sample 7 uniformly at random from A ae Pe: bad 

A+ A\ {i} with probability 1 — s | ns f 

j, I + tell_gossip(i, I), A AU {j} 
end while 





B Figure 1 Left: Parameterized Gossip. Right: Illustration of the role of muting parameter s. S 
indicates the source and C a curious node. Green nodes know the rumor, and red circled nodes are 
active. When s = 0, there is only one active node at a time, which always stops emitting after telling 
the gossip. In the case s = 1, nodes always remain active once they know the rumor (this is the 
standard push gossip protocol [2]). When 0 < s < 1, each node remains active with probability s 
after each communication. 


4.2 Muting After Infecting has Maximal Privacy 


We now study the privacy guarantees of generic Algorithm [I] when s = 0. In this protocol, 
nodes forward the rumor to exactly one random neighbor when they receive it and then 
stop emitting until they receive the rumor again. Intuitively, this is good for privacy: the 
source changes and it is quickly impossible to recover which node started the gossip (as 
initial conditions are quickly forgotten). In fact, once the source tells the rumor once, the 
state of the system (the set of active nodes, which in this case is only one node) is completely 
independent from the source. A similar idea was used in the protocol introduced in [26]. 

The following result precisely quantifies the privacy guarantees of Algorithm |1| with 
parameter s = 0 and shows that it is optimally private among all gossip protocols (in the 
precise sense of Definition [1). 


> Theorem 5. Lete > 0. For muting parameter s = 0, Algorithm[1] satisfies (€, d)-differential 


privacy with 6 = Î (1 — e) and c-prediction uncertainty with c = — 1. Furthermore, 


n 
FEN 
these privacy guarantees are optimal among all gossip protocols. 
Proof of Theorem 5] We start by proving the fundamental limits on the privacy of any 
gossip protocol, and then prove matching guarantees for Algorithm [1] with 8 = 0. 

(Fundamental limits in privacy) Proving a lower bound on the differential privacy 
parameters can be achieved by finding a set of possible outputs S (here, a set of ordered 
sequences) such that po(S) > pi(S). Indeed, a direct application of the definition of 
Equation yields that given any gossip protocol, S C S and wo, wi € R such that wo < 
po(S) and pi(S) < w, if the protocol satisfies (€, ô) differential privacy then ô > wo — ew. 
The proofs need to consider all the messages sent and then distinguish between the ones that 
are disclosed (sent to curious nodes) and the ones that are not. 

Since J = {0} then tell_gossip is called for the first time by node 0 and it is called 
at least once otherwise the protocol terminates with 1 = {0}, violating the conditions of 
Definition [1] We denote by S$) the set of output sequences such that So = 0 (i.e., 0 is the 


10 


Quantifying the Natural Differential Privacy Guarantees of Gossip Protocols 


first to communicate with a curious node). We also define the event T$ = {ta(0) 0} (the 
source does not send its first message to a curious node). For all i € C U {0}, we have that 
po(So = TS) < po(So = OTS) since po( A1 = {0}) = poli € Ai), where A, is the set of 
active nodes at time 1. From this inequality we get 


where the equality comes from the fact that So = à for some i ¢ C. The second inequality 
comes from the fact miai p;(So = iTS) patso k|T§) for all i,k A j. Therefore, we 
have po(So = O|TS) > — a and po(So = 1116) < ==. Combining the above expressions, we 
derive the probability of S) when 0 started the a We write po( S) = po(S, ta(0) = 
0) + po(S, TẸ) and then, since po(S|ta(0) = 0) = 











1 
po(S) = po (ta(0) = 0)po(Sta(0) = 0) + po(SIT§)po(75) > 4 + == (1 - À) 
In the end, po(S(0)) > E + À. If node 1 initially has the message, we do the same split and 
obtain pi(s® |ta(0) = 0) = 0 and so pı (9) = pı (TẸ)p (SOTE) < 2. 
The upper bound on prediction uncertainty is derived using the same quantities: 


pUo # ojs) _ pis 
po = 0|S) D 


pm(S) n-f-l1_ on 
po(S) f+1  f+l 








en 

<(n 

(0)) — 

igant POS”) 
Note that we have never assumed that curious nodes knew how many messages were sent 
at a given point in time. We have only bounded the probability that the source is the first 
node that sends a message to curious nodes. 

(Matching guarantees for Algorithm |1| with s = 0) For this protocol, the only 
outputs S such that po(S) #4 p1(5) are those in which ta(0) = 0 or ta(1) = 0. We write: 


po(So = 0) = po(ta(0) = 0)po(So = O|ta(0) = 0) + po(TS )po(So = O|TS). 





For any i ¢ C where C is the set of curious nodes, we have that po(So = 0|T5) = po(So 
ilTS) = aF Indeed, given that ta(0) 4 0, the node that receives the first message is selected 
uniformly at random among non-curious nodes, and has the same probability to disclose the 
gossip at future rounds. Plugging into the previous equation, we obtain: 


=l4fi fy 1 ftl 


n/n—f n 








For any other node i ¢ C U {0}, po(So = i) = po(T§)po(So = i|T$) = À because po(So = 
ilta(0) = 0) = 0. Combining these results we get po(S) < e%p,(S) +6 for any € > 0 and 
j= ta — E), By symmetry, we make a similar derivation for SU). 

To prove the prediction uncertainty result, we use the differential privacy result with 
ef = f +1 (and thus 6 = 0) and write that for any S € S: 








po  0|S) SR pi(S) = n 
p(o = OS) geome pols) = 


Theorem [5] establishes matching upper and lower bounds on the privacy guarantees of 
gossip protocols. More specifically, it shows that setting the muting parameter to s = 0 
provides strong privacy guarantees that are in fact optimal. Note that in the regime where 
€ = 0 (where DP corresponds to the total variation distance), ô cannot be smaller than the 
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proportion of curious nodes. This is rather intuitive since the source node has probability 
at least f/n to send its first message to a curious node. However, one can also achieve 
differential privacy with ô much smaller than f/n by trading-off with € > 0. In particular, 
the pure version of differential privacy (6 = 0) is attained for € ~ log f, which provides good 
privacy guarantees when the number of curious nodes is not too large. Furthermore, even 
though the probability of disclosing some information is of order f/n, prediction uncertainty 
guarantee shows that an adversary with uniform prior always has a high probability of 
making a mistake when predicting the source. Crucially, these privacy guarantees are made 
possible by the natural randomness and partial observability of gossip protocols. 


> Remark 6 (Special behavior of the source). A subtle but key property of Algorithm [1] is 
that the source follows the same behavior as other nodes. To illustrate how violating this 
property may give away the source, consider this natural protocol: the source node transmits 
the rumor to one random node and stops emitting, then standard push (Algorithm [1] with 
s = 1) starts from the node that received the information. While this delayed start gossip 
protocol achieves optimal differential privacy in some regimes, it is fundamentally flawed. In 
particular, it does not guarantee prediction uncertainty in the sense that c — 0 as the graph 
grows. Indeed, the adversary can identify the source with high probability by detecting 
that it communicated only once and then stopped emitting for many rounds. We refer to 
Appendix |B] for the formal proof. 


‘5 Privacy vs. Speed Trade-offs 


While choosing s = 0 achieves optimal privacy guarantees, an obvious drawback is that it 
leads to a very slow protocol since only one node can transmit the rumor at any given time. 
It is easy to see that the number of gossip operations needed to inform all nodes can be 
reduced to the time needed for the classical coupon collection problem: it takes O(n log n) 
communications to inform all nodes with probability at least 1 — 1/n [35]. As this protocol 
performs exactly one communication at any given time, it needs time O(n log n) to inform all 
nodes with high probability. This is in stark contrast to the standard push gossip protocol 
(s = 1) studied in Section [4.1] where all informed nodes can transmit the rumor in parallel, 
requiring only time O(log n) [I]. 

These observations motivate the exploration of the privacy-speed trade-off (with parameter 
0< s< 1). We first show below that nearly optimal privacy can be achieved for small values 
of s. Then, we study the spreading time and show that the O(log n) time of the standard 
gossip protocol also holds for s > 0, leading to a sweet spot in the privacy-speed trade-off. 


5.1 Privacy Guarantees 


Theorem [7| conveys a (0, 0)-differential privacy result, which means that apart from some 
unlikely outputs that may disclose the identity of the source node, most of these outputs 
actually have the same probability regardless of which node triggered the dissemination. We 
emphasize that the guarantee we obtain holds for any graph size with fixed proportion f/n 
of curious nodes. 


> Theorem 7 (Privacy guarantees for s < 1). For 0 < s < 1 and any fixed r € N*, Algorithm[1] 
with muting parameter s guarantees (0, 6)-differential privacy with: 


jiet (1-4) si-a- - fy" 


k=0 


11 





12 


Quantifying the Natural Differential Privacy Guarantees of Gossip Protocols 


For example, choosing r = 1 leads to 6 < s+(1— s)f, as reported in Table[1] Slightly tighter 


bounds can be obtained, but this is enough already to recover optimal guarantees as s — 0. 


Proof. We first consider that S is such that t4(0) > ta(1). Then, po(S) < pi(S) since node 
0 needs to receive the rumor before being able to communicate it to curious nodes, and 
Equation is verified. Suppose now that S is such that ta(0) < ta(1). In this case, we note 
tm the first time at which the source stops to emit (which happens with probability 1 — s 
each time it sends a message). Then, we denote F = {#4(0) < tm} (and F° its complement). 
In this case, po(S|F°) < pi(S|F°). Indeed, conditioned on F°, ta(0) > t;(0) if node 0 is not 
the source and t4(0) > max(tm,t;(0)) if it is. Then, we can write: 


po(S) = po(S, F°) + po(S, F) < p1(8, F°) + po(F) < pi(S) + po(F). 


Denoting Ty the number of messages after which the source stops emitting, we write: 


O0 


po( SA Ty = k)po(FITy = k) = È (1 -8)s ‘(1 = (= JT), for s > 0. 


n 
k=1 k=0 


Note that we can also write for k > 1 that po(F) = po(F, Ty < k) + po(F, Tr > k), and so: 








po(F) < (1 (1 (1 DELET (1 s*)(1 T < 

The differential privacy guarantees given by Theorem [7]Jand the optimal guarantees of 
Theorem [| are of the same order of magnitude when s is of order f/n. Indeed, consider 
c = 0. Then, setting r = 1 in Theorem [7] leads to an additive gap of s(1 — f/n) between 
the privacy of Algorithm [I] and the optimal guarantee, showing that one can be as close as 
desired to the optimal privacy as long as s is chosen close enough to 0. In particular, the 
ratio between the privacy of Algorithm [1] and the lower bound is less than 2 for all s < f/n. 
This indicates that the privacy guarantees are very tight in this regime. We also recover 
exactly the optimal guarantee of Theorem [5] in the case s = 0 (without the ability to control 
the trade-off between € and 6). Importantly, we also show that Algorithm [1] with s<l 
satisfies prediction uncertainty, unlike the case where s = 1. 


> Theorem 8. Algorithm|1] guarantees prediction uncertainty with c = (1 — 1) —s). 


This result is another evidence that picking s < 1 allows to derive meaningful privacy 
guarantees. The proof can be found in Appendix [C1] 


5.2 Spreading time 


We have shown that parameter s has a significant impact on privacy, from optimal (s = 0) 
to very weak (s = 1) guarantees. Yet, s also impacts the spreading time: the larger s, the 
more active nodes at each round. This is highlighted by the two extreme cases, for which the 
spreading time is already known and exhibits a large gap: O(logn) for s = 1 and O(nlogn) 
for s = 0. To establish whether we can obtain a protocol that is both private and fast, we 
need to characterize the spreading time for the cases where 0 < s < 1. 

The key result of this section is to prove that the logarithmic speed of the standard push 
gossip protocol holds more generally for all s > 0. This result is derived from the fact that 
the ability to forget does not prevent an exponential growth phase. What changes is that 
the population of active nodes takes approximately 1/s rounds to double instead of 1 for 
standard gossip. For ease of presentation, we state below the result for the synchronous 
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B Figure 2 Effect of parameter s of Algorithm [1|on the spreading time for a network of n = 21° 
nodes. The curves represent median values and the shaded area represents the 10 and 90 percent 
confidence intervals over 100 runs. Each curve stops when all nodes are informed (and so the protocol 
terminates), except for s = 0 since the protocol is very slow in this case. 


version of Algorithm [1] in which the notion of round corresponds to iterating over the full 
set A. A similar result (with an appropriate notion of rounds) can be obtained for the 
asynchronous version given in Algorithm [I] 


> Theorem 9. For a given s > 0 and for all 1 > 6 > 0 and C > 1, there exists n large 
enough such that the synchronous version of Algorithm [1] with parameter s sends at least 
Cnlogn messages in 6C log(n)/s rounds with probability at least 1 — ô. 


Proof sketch. The key argument of the proof is that the gossip process very closely follows 
its mean dynamics. After a transition phase of a logarithmic number of rounds, a constant 
fraction of the nodes (depending on s) remains active despite the probability to stop emitting 
after each communication. This “determinism of gossip process" has been introduced in [36], 
but their analysis only deals with the case s = 1. Our proof takes into account the nontrivial 
impact of nodes deactivation in the exponential and linear growth phase. Besides, we need 
to introduce and analyze a last phase, showing that with high probability the population 
never drops below a critical threshold of active nodes. The full proof is in Appendix [C2] < 


Theorem [9] shows that generic gossip with s > 0 still achieves a logarithmic spreading 
time even though nodes can stop transmitting the message. The 1/s dependence is intuitive 
since 1/s rounds are needed in expectation to double the population of active nodes (without 
taking collisions into account). Therefore, the exponential growth phase which usually takes 
time O(log n) now takes time O(log(n)/s) for s < 1. To summarize, we have shown that one 
can achieve both fast spreading and near-optimal privacy, leading to the values presented in 
Table [of the introduction. 


l6 Empirical Evaluation 


In this section, we evaluate the practical impact of s on the spreading time as well as on the 
robustness to source location attacks run by adversaries with background knowledge. 
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B Figure 3 Effect of parameter s of Algorithm [1]on the precision of source location attacks for a 
network of n = 21° node with 10% of curious nodes. Precision is estimated over 15,000 random runs. 


6.1 Spreading Time 


To complement Theorem [9] which proves logarithmic spreading time (asymptotic in n), we 
run simulations on a network of size n = 218. The logarithmic spreading time for s > 0 is 
clearly visible in Figure where we see that the gossip spreads almost as fast for s = 0.5 
that it does for s = 1. We also observe that even when s is small, the gossip remains 
much faster than for s = 0. The results in Figure [2b] illustrate that the fraction of active 
nodes grows exponentially fast for all values of s > 0 and then reaches a plateau when the 
probability of creating a new active node is compensated by the probability of informing an 
already active node. Empirically, this happens when the fraction of active nodes is of order s. 

We note incidentally that gossip protocols are often praised for their robustness to lost 
messages [37] [38]. While the protocol with s = 0 does not tolerate a single lost message, 
setting s > 0 improve the resilience thanks to the linear proportion of active nodes. The 
latter property makes it unlikely that the protocol stops because of lost messages as long as 
s is larger than the probability of losing messages. Of course, the protocol remains somewhat 
sensitive to messages lost during the first few steps. 


6.2 Robustness Against Source Location Attacks 


Getting an intuitive understanding of the privacy guarantees provided by Theorem [7] is not 
straightforward, as often the case with differential privacy. Therefore, we illustrate the effect 
of the muting parameter on the guarantees of our gossip protocol by simulating concrete 
source location attacks. We consider two challenging scenarios where the adversary has some 
background knowledge: either 1) prior knowledge that the source belongs to a subset of the 
nodes, or 2) side information indicating that the same source disseminates multiple rumors. 


Prior knowledge on the source. We first consider the case where the adversary is able 
to narrow down the set of suspected nodes. In this case we can design a provably optimal 
attack, as shown by the following theorem (see Appendix |C.3] for the proof). 


> Theorem 10. Jf the adversary has a uniform prior over a subset P of nodes, i.e., p(lo = 
i) = p(lo = j) for alli, j € P and p(Ip = i) = 0 fori ¢ P, and for some output sequence S, 
te is such that Sı, € P and S ¢ P ift < te , then p(Ip = S:,|S) > p(lo = ilS) for all i. 


A. Bellet, R. Guerraoui and H. Hendrikx 15 


Theorem |10| means that under a uniform prior over nodes in P, the attack in which curious 
nodes predict the source to be the first node in P that communicates with them corresponds 
to the Maximum A Posteriori (MAP) estimator. The set P represents the prior knowledge 
of the adversary: he knows for sure that the source belongs to P. 

Figure [3a] shows the precision of this attack as a function of s for varying degrees of 
prior knowledge. We see that, when s is small, the prior knowledge does not improve the 
attack precision significantly, and that the precision remains very close to the probability that 
the source sends its first message to a curious node. This robustness to prior knowledge is 
consistent with the properties of differential privacy (see Section [2.3). On the contrary, when 
s is high (i.e., differential privacy guarantees are weak), the impact of the prior knowledge 
on the precision of the attack is much stronger. 


Multiple dissemination. We investigate another scenario in which differential privacy 
guarantees can also provide robustness, namely when the adversary knows that the same 
source node disseminates multiple rumors. In this setting, analytically deriving an optimal 
attack is very difficult. Instead, we design an attack which leverages the fact that even 
though the source is not always the first node to communicate with curious nodes, with high 
probability it will be among the first to do so. More precisely, the curious nodes record the 
10 first nodes that communicate with them in each instance (results are not very sensitive to 
this choice), and they predict the source to be the node that appears in the largest number 
of instances. In case of a tie, the curious nodes choose the node that first communicated 
with them, with ties broken at random. Figure|3b|shows that the precision of this attack 
increases dramatically with the number of rumors when s is large, reaching almost sure 
detection for 10 rumors. Remarkably, for small values of s, the attack precision increases 
much more gracefully with the number of rumors, as expected from the composition property 
of differential privacy discussed in Section [2.3] Meaningful privacy guarantees can still be 
achieved as long as the source does not spread too many rumors. 


KA Concluding Remarks 


This paper initiates the formal study of privacy in gossip protocols to determine to which 
extent the source of a gossip can be traceable. Essentially: (1) We propose a formal model 
of anonymity in gossip protocols based on an adaptation of differential privacy; (2) We 
establish tight bounds on the privacy of gossip protocols, highlighting their natural privacy 
guarantees; (3) We precisely capture the trade-off between privacy and speed, showing in 
particular that it is possible to design both fast and near-optimally private gossip protocols; 
(4) We experimentally evaluate the speed of our protocols as well as their robustness to 
source location attacks, validating the relevance of our formal differential privacy guarantees 
in scenarios where the adversary has some background knowledge. 

Our work opens several interesting perspectives. In particular, it paves the way to the 
study of differential privacy for gossip protocols in general graphs, which is challenging and 
requires relaxations of differential privacy in order to obtain nontrivial guarantees. We refer 
to Appendix [D] for a discussion of these questions. Another avenue for future research is 
motivated by very recent work showing that hiding the source of a message can amplify 
differential privacy guarantees for the content of the message [39] 40, [41]. Unfortunately, 
classic primitives to hide the source of messages such as mixnets can be difficult and costly 
to deploy. Showing that gossip protocols can naturally amplify differential privacy for the 
message contents would make them very desirable for privacy-preserving distributed AI 
applications, such as those based on federated [23] and decentralized machine learning [42]. 
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lA Model Extensions 


We kept our model relatively simple in the main paper to avoid unnecessary complexity in 
the notations and additional technicalities in the derivation and presentation of our results. 
In this appendix, we briefly discuss various possible extensions. Basically, we make here the 
point that, although these generally lead to technical complications, they do not significantly 
impact privacy guarantees. 


A.1 Pull and Push-Pull Protocols 


Our study focus on the classic push form of gossip protocols. This can be justified by the fact 
that, for regular graphs, synchronous push has asymptotic spreading time guarantees that 
are comparable with the push-pull variant [43]. Besides, the differential privacy guarantees 
of any gossip protocol are limited by the probability that the first node informed by the 
source is a curious node, and we show this bound can be matched with push protocols. 
Nevertheless, extensions of our results to pull and push-pull variants of gossip protocols [3] 
are possible. Forgetting mechanisms similar to the ones in Algorithm [I] can be introduced for 
these protocols, i.e. nodes would have a probability 1 — s to stop disclosing information after 
each time they are pulled (if they do not pull someone with the information in between). 
Although slightly different, the optimal privacy guarantees would remain of the same order 
of magnitude. Yet, we expect pull guarantees to be even worse in the case s = 1 because 
curious nodes could stop suspecting all nodes that they have pulled and that did not have the 
rumor. Besides, the pull protocol for s = 0 would be even slower than its push counterpart. 


A.2 Eavesdropping Adversary 


Since we consider a complete graph, our formalization of the adversary as a fraction f/n 
of curious nodes is closely related to an eavesdropping adversary who would observe each 
communication with probability f/n. Indeed, both models consider that each communication 
has a probability f/n of being disclosed to the adversary. Most of our results are thus easily 
transferable to this alternative setting. The only difference would be that all nodes can be 
suspected in the eavesdropping model, thus introducing a (1 — f/n)~+ factor each time we 
consider the population of non-curious nodes. 


A.3 Information Observed by the Adversary 


We discuss three possible variants of the output observed by the adversary. 


A.3.1 Messages Sent by Curious Nodes 


For simplicity of exposition, we considered that curious nodes only observe messages that 
are sent to them and not the messages that they send. However, including the messages sent 
by curious nodes in their observed output would not impact the bounds on privacy (i.e., the 
guarantees for the algorithms). For the optimal algorithm, we only consider what happens 
during the first round, so including the messages sent by curious nodes does not change the 
result. This in particular implies that the fundamental limits of Theorem [5] remain the same 
(since the adversary observes strictly more information). Similarly, for the parameterized 
algorithm, Theorem [7|is obtained by bounding the probability of a set $. Then, we have 
p($, Sout) < p($) where Sout is the sequence of messages sent by the curious nodes. In 
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general, adding the messages sent by curious nodes to the output sequences has little or no 
impact on our results. 


A.3.2 Message Ordering 


We assumed that the relative order of messages is preserved in the output sequence observed 
by curious nodes. This could be relaxed, as in real-world networks a message sent before 
another may well be received after it. One could for instance introduce a random swapping 
model to take this into account and investigate whether this weaker output leads to an 
improvement in the privacy guarantees. However, we argue that this improvement would be 
quite limited. First of all, it would not affect the privacy guarantees of the optimal protocol: 
since there is a single active node able to send a message at any given time, swapping is not 
possible. Therefore, the lower bound and the matching algorithm would not be affected by 
this change. Since parameterized gossip is almost privacy-optimal for small values of s and 
swapping would only increase privacy, then we argue that the guarantees of parameterized 
gossip would be very similar in this case. Furthermore, even when several nodes are active 
at the same time (e.g., in Algorithm [I] large s), the proofs can be adapted to work with 
counting the messages received instead of the messages sent. In this case, swapping is as 
likely to expose the source (making its messages arrive earlier) than to hide it (delaying the 
messages it sends). 


A.3.3 Global Timing 


In our model, we assume that curious nodes only have access to the relative ordering in 
which they received the messages but they have no information on the global time at which 
it was sent. This is justified in practice by the asynchrony and locality of the exchanges. 
We briefly discuss here how the privacy guarantees are affected if one considers a stronger 
adversary that has access to the number of times the tell_gossip procedure has been called. 
Formally, this adversary observes the set S = {(t,7,7)|(i,7) = (Somni)t,7 € C}. This set can 
be turned into a sequence by ordering it by increasing values of t. Note that this is not a 
realistic adversary as gossip protocols naturally enforce partial observability of the events. 

The following result quantifies the limits of privacy for this stronger adversary, which can 
be compared to the results of Theorem [5] in the main text. We can see that in the regime 
€ = 0 (total variation distance), the limits remain the same. However, achieving 6 < f/n 
and prediction uncertainty is not possible against this stronger adversary. Note also that 
Algorithm 1 with s = 0 remains optimal. 


> Theorem 11. If a gossip protocol satisfies (€,6)-differential privacy and c-prediction 
uncertainty then we have 6 > £ and c = 0 in the strong adversary setting. Furthermore, 
these bounds are tight and matched by Algorithm 1 when its parameter is set to s = 0. 


Proof. The fact that tell_gossip is called at least once and is first called on node 0 still 
holds. Sequence S$) now denotes the fact that node 0 communicates with a curious node at 
time 0. Since the protocol is run on the complete graph, the node selected by tell_gossip 
is chosen uniformly within {0,...,2 — 1}, so a curious node is selected with probability À. 
We thus have po(S®) = £. Besides, node 0 cannot communicate with a curious node at 
time 0 if node 1 starts the rumor so p;(S)) = 0. For prediction uncertainty, using the same 
sequence $( yields Ben = 0 for all i 4 0 and therefore c = 0. 

It remains to show that these bounds are matched by Algorithm[I}with s = 0. The fact that 
the only outputs that have a different probability if node 0 starts (compared to the case when 1 
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starts) are those in which 0 (or 1) communicates with a curious node for its first communication 
is still true with the stronger adversary. Then, we write po(So = 0) = pi(So = 1) £ and 
po(So = 1) = pı (So = 0) = 0. This ensures that po(S(0)) < pı(S®) + £ (similarly for S®), 
and the result follows. < 














A.4 Malicious Behavior 


We also assumed for simplicity that nodes are curious but not malicious, i.e., they follow 
the protocol. This is motivated by a practical scenario where a subset of nodes are simply 
being monitored by a curious entity. If curious nodes can also act maliciously, they have 
three possible ways to affect the protocol (abstracting away the content of the information): 
emitting more, emitting less, or not choosing neighbors uniformly at random. If they emit 
more, they will inform more nodes, which makes it more difficult for them to locate the 
source. If they emit less (potentially not at all), then in the case s < 1, the protocol could 
stop before all nodes are informed. Yet, the privacy bounds are derived from the fact that 
the source forgets the information before communicating to a curious node. Choosing the 
neighbors they send the messages to reduces to the case in which they emit less (for they do 
not send messages to uninformed nodes) but without affecting protocol speed or termination 
(this does not reduce the number of active nodes). Thus, the impact on the observed output 
and therefore on the privacy would be minimal. In the case s = 1, malicious nodes have 
slightly more impact but remain quite small: this case only makes the set of informed nodes 
grow slightly slower. 


A.5 Termination Criterion 


For simplicity, in all our gossip protocols we used a global termination criterion (the protocol 
terminates when all nodes are informed). Termination without using global coordination is a 
problem in its own right that has been extensively studied (see for instance [3]). Although 
some termination criteria could have a great impact on privacy, we argue that termination 
can be handled late in the execution so as to reveal very little about the beginning, hence 
avoiding any significant impact on privacy. For instance, it is possible to design a variant of 
Algorithm [I] in which nodes only flip a coin with probability s for a fixed number of times, 
and then stop emitting completely. This fixed number would have to depend on s, but then 
if it is large enough, it would guarantee both termination and privacy. Indeed, nodes would 
not communicate with curious nodes each time they are activated with high probability so 
this counter would actually provide very little information to the curious nodes. Determining 
how large this number of iterations should be, and the exact impact on privacy (which we 
argue is very small), is beyond the scope of this paper. 


'B_ Delayed Start Gossip 


Consider the protocol described in Remark [6] which we call delayed start gossip: 
1. The source calls tell_gossip once to forward to an arbitrary node, say node j. 
2. Node j then starts a standard push protocol (Algorithm [I] with 8 = 1). 


This simple protocol is clearly optimal from the point of view of differential privacy in 
the regime € = 0 (total variation distance). Indeed, if the first communication does not hit a 
curious node then the probability of a given output when two different nodes start the gossip 
is the same. It is also fast since it runs the standard gossip after the first round. 
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Yet, this naive protocol has a major flaw. Indeed, when the first communication hits 
a curious node, the adversary can monitor whether the sender communicates with curious 
nodes again in the next rounds. If it does not, they can guess that the node is the source, 
and they will in fact make a correct guess with probability arbitrarily close to 1 for large 
enough graphs. On the other hand, when the sender communicates again with a curious 
node shortly after, they can be very confident that this node is not the source. Hence, it is 
possible to design a very simple attack with a very high precision (almost always right) and 
almost optimal recall (identifying the source with certainty every time the information is 
actually released, i.e. with probability É), 

Making sure that the adversary is uncertain about its prediction is therefore a desirable 
property. This is captured by our notion of prediction uncertainty. The following proposition 
formalizes the above claims. 


> Proposition 12. We call cas the prediction uncertainty constant of the delayed start protocol 
and we assume the ratio of curious nodes f/n to be constant. Then cas — 0 when n — oo. 


More generally, it is in principle possible to prove similar results for any protocol in which 
the source node does not behave like other nodes. Indeed, if the special behaviour can be 
detected, then the adversary can know for sure the source of the rumor. This motivates the 
need for more involved protocols such as those covered by Algorithm [I] 


Proof of Proposition The proof reuses some elements of the proof of Theorem [4] We 
consider the sequence S® such that node 0 is the first node to communicate with a curious 
node (So = 0) and then r other nodes communicate with curious nodes before 0 does (S; 4 0 
for i € {1,...,r}). We denote by to the time at which node 0 gets the message and becomes 
active again (we refer here to the global order, although of course the curious nodes do not 
have access to it). Then, with the usual notations we have: 


Po (5) = po(So = 0) po (5080 = 0) 


> no (fas: # 01S = 0) 
> age, >r) 
n 
> Lpolne(r) < k*)po(to > rinc(r) < k*). 


Then, we recall from the proof of Theorem [4] that 


po(ne(r) < k) =p (Binom Î) > r) 
=p (Gino 1- f) <k- r) 


= 1- p(Binom(k,1- É) >k =r), 
n 


so if we set k = a and use tail bounds on the binomial law (Theorem 1 of [44]) then 


there exists a constant H (only depending on f/n) such that po(ne(r) < Fr) >1-e-"À, 
Therefore, we have: 


po (80) > Laem) (1-5) zan: (a 
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The last line comes from calculations done in the proof of T heorem |A] We now study pı (30), 
Since node 1 started the protocol then it means that no other node (and in particular 0) will 
stop emitting the message. Therefore, if node 0 is the first to communicate with a curious 
node then it will remain active for the whole duration of the protocol. Consider that the 
first disclosure happens after T; communications. We can write: 


pr (59) < pi(So = 0)p1 (Mai Si # 0/50 = 0, Tp < ty) + pa(Zy > ty). 


Since the fraction of curious nodes is constant, we can choose ts independently of n or r 
such that p(T; > tf) < eats < CE) if ty = 7 log (4-2) in order to control the second 
term. Then, 


tr+r fl ft tetra 
pa (NaS; À 0|So = 0, Ty < ty) < Il (1-47) <e Et, 


t=ty 





A series-integral comparison yields that if r = log?(n) then exp ( f E > 1) < 5 forn 


large enough. Finally, we use the fact that p1(S0 = 0) < af to write that pı (s®) < IT 
Finally, we observe that Cy (log? n, n) > Î when n — oo where C1 is defined in Equation|4] 
In particular, C4 (log? n,n) > r for n large enough, so we have 


p(o # 0150) _ pi(S®) 
(lo = 0180) a 


Since € can be picked arbitrarily small and 7 is assumed to be constant then the previous 
ratio can be made arbitrary small. < 


< 7e (5) 


“C* Detailed Proofs 


C.1 Privacy Guarantees 


Proof of Theorem [4] Intuitively, the proof relies on the fact that the event {ta(0) < ti(1)} 
(node 0 communicates with a curious node before node 1 gets the message) becomes more 
and more likely as n grows, hence preventing any meaningful differential privacy guarantee 
when n is large enough. To formalize this, we study sO = {S, S; = 0 for some t < r}, the 
set of output sequences such that the rank of node 0 in the sequence is less than r. For a 
specific sequence to not be in sO, there must have been at least r communications (because 
r nodes must have communicated with curious nodes), and none of them involved 0 and 
a curious node. Therefore, if we note nc(r) the number of communications that actually 
happened before the output sequence reached size r, we have ne(r) > r. Then, denoting by 
C(t) the node that communicated with a curious node at time t (with C(t) = —1 when the 
communication did not involve a curious node): 


po( S$) = 1-p (nepe) 4 0) = 1- TT p(C@ #0) 21-T- (1-42), 


where the last step comes from the fact that the probability of node 0 to be selected at time t 
is Ta > + because at most one node is informed at each step and the active node is selected 
uniformly among informed nodes. We use the fact that log(1 + x) < x for any x > —1 on 


T= ir to get: 
tao (1- rh) =ep(Eiole(i-f4))<em(-15 ch). (6) 
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Therefore, po( So”) goes to 1 as r goes to infinity. We emphasize that we do not need to fix 
any network size for this result to hold since the ratio f/n is assumed to be constant. 
Then, for a given r and for any k > 0, p(n-(r) < k) is equal to p(Binom(k, £) > r) where 
Binom(k, 2) is the binomial law of parameters k and À. This is because it is the probability 
of having exactly r successes with the sum of less than k Bernoullis of parameter À, which 
is equal to the probability of having more than r successes with the sum of k Bernoullis of 
the same parameters. Therefore, p(n-(r) < k) is independent of n and we can choose k* 


independently of n such that p(ne(r) > k*) < 4. Then, we write that 
pr (Si) = p (8P, nelr) < k*) +m(S0,ner) > k*) < pi(S ner) < k*) +1/n. 


This implies p,(S0°|n.(r) < k*) < p1(0 € I,|ne(r) < k*) < 1—p, (0 € I,|n-(r) < k*). We 
know that only r communications have reached curious nodes but the others have reached a 
random node in the graph, and there is at most k* of them, so finally: 





m(S) <1- (1 JE 


n n 


We immediately see that pri( So) goes to 0 as n grows because k* is independent of n, 
and we have shown above that po( So) goes to 1 as n grows. Since we must have that 
po(S0) < ep (S50) +ô, we must have 6 = 1 if we want 6 and € to be independent of n. < 


Proof of Theorem For any set of sequences S C S such that po(S) > 0: 


p(lo #018) _ pi(S) pi(Ai = {0})pi(S|A1 = {0}) 
po = 03) > po(S) 








i CU{0} Po(S) ~ id CU{0} 


where A, is the set of active nodes at round 1. Because the state of the system (active nodes) 
is the same in both cases we can write that p;(S|Aı = {0}) = po(S). Besides, p;(A1 = {0}) 
corresponds to the probability that node 7 sends a message to node 0 and then stops emitting. 


Therefore: Pr > (1 m) (1— 5) > 0. < 





C.2 Spreading time 


Proof of Theorem D] The idea of this proof is to rely on the “determinism” of gossip process, 
similarly to [86]. This means that the gossip process very closely follows its mean dynamics. 
In our case, there is an added difficulty in the fact that extra randomness is introduced by 
the deactivation of the nodes. Yet, we precisely quantify the impact of this phenomenon on 
the results. We start by showing that if more than k(s) nodes are informed at a given time, 
then with very high probability the number of informed nodes never drops below this fraction 
once it is reached. Therefore, a number of messages proportional to the size of the graph is 
sent at each round. The condition on s for this to happen is written in Equation (11). More 
formally, we fix s € (0,1] and denote by A; the number of nodes that are active at round t, 
which is such that A; = œm. Then, we note 


fia—1-p,(a)(l — as), (7) 














where p,(a) = (1 — 4)°". Note that f(a) = +E[A;41|Az = an]. To see this, we count the 
number of active nodes at time t+ 1. In total, A; = an messages are sent at the beginning 
of the round. Therefore, for each node, the probability of having received a message at the 


end of the round is exactly 1 — p,(a) since each message has a 1/n probability to be sent to 
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this specific node. In the end, n (1 — p,(æ)) nodes get the message in expectation. The rest 
of the active nodes at time t + 1 is made of the nodes that were active, did not receive the 
message and did not deactivate, which represents a portion nap,(a)s of the nodes. Then, 
one can see that the function f is simply the sum of these 2 terms. We show by using that 
(1—a)¥ < e7” <1- gryt SE that for a < as = 


s $ 
I 2s’ we have: 


fa) 2 (1+5)0. (8) 


Then, we follow the same steps as in Lemma 15 in [86]. We call A; the number of active 
nodes at round t, and A; the number of active nodes at round t after m messages have 
been sent (so during the round). Then, we can define X; = At i+1 — Ai. At,i+1 only depends 
on Az; and so does X;: 





1 with proba s(1 — Fais 
Xi = { —1 with proba (1-— s) a 


0 otherwise 














Then, we define the martingale Z; = pes Xil Xi; ,Xi, A]. This allows us to write 
Aisi — nf(a) = Zo — Za,. If we call Skt = > à Xi then for any d € {—1,0,1}: 














[91 ,2|X1,, Xi, Xi+ı = 1, 4] 
= Si tl Xa, idi XG, Xiti = d, Ai] 
> ESA, Xi, Aie = —1, A], 






































because the distribution of X; only depends on A;,;. Therefore, |Zj41—Z;| < (1+E[Sj41,4|As+ 
1]) — (E[S;41,2/Az — 1] — 1)] < 2. Azuma’s inequality [45] then gives: 














Ak)? 


A 
p (as = nf(—) < —)A;|A; = x) < e 8 , (9) 





We also have that p(Ary1 < k| At > k) < p(Ay11 < k|Ay = k). Then, for any a < as, 
Equation [8] yields that for all À: 


p (Ana < Ar (1+5-A) 14e) < er At, (10) 


We can then bound this expression by using Equation [9] with A = 5, leading to 


pd < kl Ay > k) < e 32* ifa L Ag 


Denoting by Nx; the number of messages sent between rounds k and j, we can decompose 
over Ca! logn rounds so that if m is such that there are at least a active nodes at round 
m then: 


32 


GN aston togn 2 Cn log n) 2 (1 -e aye logn, 





because it is equal to the probability that the fraction of active nodes never goes below a for 
Ca! log n rounds. Therefore, if 
2. 32 Clogn 


>] 
S = on 8 œlog(1 — ô) 





, then p(Nm,m+Ca-t een = Cnlogn) > 1-— ô. (11) 


Equation [11] gives a lower bound on the value of a. Note that for a fixed a, this lower 
bound goes to 0 as n grows so in particular, Equation [I1]is satisfied for a = ag if n is large 
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enough. It now remains to show that such a fraction a of active nodes can be reached in 
logarithmic time. Usual gossip analysis takes advantage of the exponential growth of the 
informed nodes during early rounds for which no collision occur. We have to adapt the 
analysis to the fact that nodes may stop communicating with some probability and split the 
analysis into two phases. 

In the rest of the proof, we prove that a constant fraction of the nodes (independent of n) 
can be reached with a logarithmic number of rounds. We first analyze how long it takes to 
go from O(log n) to On ) active nodes and then from 1 to O(logn). Equation [8] along with 
Equation [9] with À = $ give that as long as A;,(1 + 4)’ < asn then 


p (Aani > Aall HEHA = Aalt HEN) 31-0 


4 





for any t > to such that Ar (1 + ey < nas . Therefore, if we do this for allt < ta, = is se IFs 2 
ta 


A+ a2 
rounds (so for a logarithmic number of rounds) then p (Aia, Ho > nal A) > (1 ee | 
because in this case, A; > A, for t > to. Therefore, if 








128 zi 
Ar > —— log (1 (1 — ô) À then p(At,,4t) > nas| Ar) > 1— 6. (12) 
Using the fact that $ — x) < e7® < 1- ry + along with the fact that ô < 1 < ta, to 
simplify Equation [12] we show that if Ar, ét 
128. /2ta, 
At, > “a 108 ( 5 ) ; (13) 


then it also satisfies Equation [12] 

It only remains to prove that such an Az, can be reached with to logarithmic in n. For this, 
use again Azuma inequality but on « consecutive rounds this time. Therefore, Equation [10] 
becomes, assuming that at least A; messages are sent at each round: 


p(n < Ae (1+ z À) |Ar) < e Ar, (14) 


We apply this inequality for k = 2C log(n)/s and À = C'log(n)/2 (which is valid because at 
least Ap = 1 node is active at each round), which yields: 


p (4. 214 TE) 4) < ) < eo EMC. (15) 


In particular, for a fixed values of C, s, and 6, then p (A. > Crest) A, ) <1—0 for n large 


enough. Finally, ta, is logarithmic in n so similarly, Equation is satisfied for tọ = kif n 
is large enough. 
We conclude the proof by noting that 


p (No,to+ta. 4 Cat logn > Cnlogn) 


12 2ta 12 2ta 
> p (Ay > los ( =) )p (a. sto 2 maslAe, > log ( a) 


X p (Nia, +to,ta,+totCa-!logn > Cnlogn|Attt,, > nas) 
> (1-8)? > 1= 30. 








Finally, we have that to < 2C log(n)/s, ta, < log(n)/s and 1/a; < 3/s so in the end, 
to + ta, + Cat logn < 6C log(n)/s. Without loss of generality, 6 can also be replaced by 
6/3. < 
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> Remark 13 (Extension to the Asynchronous Version). The first part of the proof directly 
extends to the asynchronous algorithm by simply considering slices of time during which a 
set of an nodes send an messages, which essentially means constant time. Then, we consider 
a logarithmic number of slices. The phase from 1 to O(log) active nodes requires sending a 
logarithmic number of messages and can thus be done in logarithmic time. Finally, phase 2 
(going from O(log n) to O(n) active nodes) consists in evaluating a logarithmic number of 
rounds during which a logarithmic number of nodes are active. Again, the only important 
thing is the number of messages sent (and not which node sent them) so using constant 
time intervals ensures that enough messages are sent between each pseudo-rounds with 
high probability. To summarize, it is possible to prove a statement very similar to that of 
T heorem [9] in the asynchronous setting, where the notion of rounds is replaced by constant 
time intervals. We omit the exact details of this alternative formulation. 


C.3 Maximum Likelihood Estimation 


Proof of Theorem We recall that S is the output sequence observed by curious nodes, 
so that So, is the first node that communicates with a curious node. The source is noted Ip, 
as it is the first node informed of the rumor. The set P is such that p(lo = i) = 0 if i ¢ P. 
Recall that te is such that S;, € P and S; ¢ P for 0 < t < te. By a slight abuse of notation, 
note A; the set of active nodes at the time where S; is disclosed (time of t-th communication 
with a curious node), so S; € A; for all t. 

We know that for all i € P then p((S:)r<z,|At., Jo = à) = p((St)t<te| At.) since S; & P 
for t < te. Similarly, p((S:)i>¢.|Ae., Lo = à, (St)t<te) = P((St)t>te| At.) since the output after 
some time only depends on the active nodes at that time. Therefore, p(S|A4,, Io = i) = 
p(S|A;.) for all i € P, which critically relies on the fact that te is the time of first disclosure 
of a node in P (the first inequality would not hold otherwise). We note [n] = {1,...,n}. We 
then write for i € P: 


Plo = is) = XO p(Ar, = AlS)p(Io = i] At, = A, S) 


ACÎn] 
= XO p(4, = AlS)p(Io = il At, = À) 
AC[n] 
Ip =12 
= SpA = Alp P=) yA, = Al =). 
AC[n]:8;,€4 Plr. = A) 








Let j € PN A. Ifi € Ay, then p(Ay, = Allo = i) = p(At, = Allo = j). Otherwise, 
let us denote E;;(A) = re a\{j}1{k active at time te} Neg augi} {k inactive at time te}. This 
event represents the realization of A+, for all nodes different from i and j. We then write: 


p(At, = Allo = i) = p(Nrea{k € A4} Mega {k € At. tlo = à) 


= p(Ei;(A)Ho = i)p(j € Ar. i € At. [Lo = i, Fij(A)) 

= p(Ei;(A)Ho = j)p(j € At., E At. [Lo = i, Ei5(A)) 

< p(Ei;(A)Mo = j)p(j € Ar., i E At. [Lo = j, Ei;(A))) 
p(Ar. = Allo = j). 








The inequality comes from the fact that it is more likely that j is active and à is 
inactive if j is the source than if i is (ie., if it is already the case at the beginning). 
This means that p(A:, = Allo = i) < p(Ai, = Allo = j) for all i € [n] and j € Ar. 
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Since the summation is over all A such that So € A (by definition of S+, and A44,), and 
p( At, = Allo = à) < p(Az, = Allo = St), we have for all considered A: 


= ilS) = _ ag, Pot) A -ALi 
plo = |S) = , À P(At. = A) A. = grt = Alb = i) 
C[n]:St. EA 
Lo = S 
< D plAt. = Ajs) 20 St) 504, = Allo = St.) 
AC[n]:51.€A p(At, = ) 
= p(lo = Si.|S) 


This means that S;, is more likely to be the source than any other suspected node when 
the adversary observes output S. Note that this requires uniform prior over nodes that can 
be suspected since we used the fact that p(lo = i) = p(lo = St.) for all i € P. For i ¢ P, 
plio = i]S) = 0 < p(Ip = Si. lS). < 


ID Challenges of Private Gossip for General Graphs 


A natural extension of this work is to consider general graphs. We discuss in this section 
several aspects related to the natural privacy of gossip protocols in arbitrary graphs. In 
particular, we highlight the fact that problem-specific modeling choices are needed to go 
beyond the complete graph, and that even defining a notion of privacy that is suitable for all 
graphs is very challenging. 


D.1 Average-Case versus Worst-Case Privacy 


Unlike the case of complete graphs, the location of curious nodes critically impacts the 
privacy guarantees in arbitrary graphs. A naive way to deal with this issue is to randomize 
the location of curious nodes a posteriori. Let us denote by a j the set containing all subsets 
of nodes of size f of the graph that do not contain à and j. For fixed nodes 7 and j, the 
set of curious nodes C is drawn from U (£f j)» the uniform distribution over LÍ j- For some 
parameters €, ô > 0, privacy can be defined as follows: Vi, j € {0,....n—1}, YSE S 














tesut ,)[Pi(S,€) — e‘p;(s,C) < 8. 


Note that pi(S,C) = 0 if the output sequence S is not compatible with the set of curious 
nodes C, i.e. if (k,l) € S and k,l ¢ C. To pick the curious nodes, it is possible to either pick 
a set of f curious nodes at once or to pick each node (except for i and j) with probability 
f/n. This randomized definition allows to prove a bound similar to that of T heorem [5] for 
arbitrary graphs. Indeed, the first node that receives the rumor has probability Î of being 
a curious node. However, such average-case notions of privacy are highly undesirable: in 
this case, no protection is provided against a (much more realistic) adversary that controls a 
fraction of nodes fixed in advance. 

The worst-case approach consists in bounding the maximum difference instead of the 
expectation. This is the approach taken in our work for the complete graph (the max 
operator is implicit because the location of curious nodes does not matter in a complete 
graph). In the case of general graphs, the corresponding privacy definition is given by: 
Vi, j € {0,...,.n-1}, VS ES, 


max [pi(S,C) =e e‘p;(5,C)] = ô. 
ceci, 
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We immediately observe that with this definition, it is impossible to have 6 < 1 as soon 
as there is a node in the graph with less than f neighbors. This modeling choice is quite 
unrealistic as well because having a node surrounded by curious nodes means that the 
adversary actually believes this specific node has a strong probability of being the source and 
therefore put more sensors around it. A possible alternative would be to place curious nodes 
so as to bound the maximum privacy for any pair of nodes, and then evaluate the minimum 
privacy in this setting. This definition would mean that the adversary wants to be able to 
distinguish any pair of nodes as best as possible. 

We see that choosing the locations of the curious nodes in an arbitrary graph is a complex 
problem that is heavily dependent on the topology of the graph and on the prior of the 
adversary on the locations of the curious nodes. Indeed, the adversary may simply want to 
isolate a sufficiently small group of nodes that have a high probability of being the source. 


D.2 Relaxing the Differential Privacy Definition 


Differential privacy is a very strong notion that enforces indistinguishability between all pairs 
of nodes, in order to be robust to any prior information about who might be the source. In 
particular, an adversary should not be able to precisely identify the source even if it knows 
that only two nodes in the graph can be the source. Although it was possible to obtain 
meaningful privacy guarantees of this kind for the complete graph, this appears to be too 
strong of a requirement for some graph topology and location of curious nodes. Consider 
for instance the extreme case of a line graph. It is clear that any non-trivial adversary can 
always distinguish between two segments of the line. This intuition directly extends to any 
graph which admits a cut with only curious nodes in it. 

A natural idea is to restrict the pairs of nodes that are required to be indistinguishable. 
Several ways of doing this may be considered. For instance, one could require that each 
node is indistinguishable from k other nodes in the graph. Such relaxed definition could be 
obtained using the Pufferfish framework [46], which explicitly provides a notion of secret to 
protect. But how to choose such k nodes based on the topology and how to characterize the 
optimal locations of curious nodes is very challenging. Another direction could be to adapt 
the notions of metric-based differential privacy [47] 48] to design a notion of privacy where 
the required indistinguishability for a given node is a function of its distance to curious nodes 
in the graph, or to require that pairs of nodes become less indistinguishable with distance in 
the graph. Yet, it is not clear how to characterize the influence of the graph topology. 


D.3 Optimality of Algorithm [I| with s=0 


We have seen in this section that the privacy guarantees for arbitrary graphs heavily rely on 
the particular privacy notion and that some recent privacy frameworks may provide tools 
to relax the classic differential privacy definition which is generally too strong for arbitrary 
graphs. We conjecture that for some of these relaxed definitions, the optimal algorithm for 
general graphs will be the same as in our case of the complete graph. Indeed, the strength of 
Algorithm [I] with s = 0 is to forget initial conditions quickly. In the complete graph, it does 
so in one step. In an arbitrary graph, the information about the part of the graph the source 
belongs to is still present after some steps, but the source should quickly be completely 
indistinguishable from its direct neighbors. In particular, attacks based on centrality [T7] 
are rather meaningless against this algorithm because spreading only occurs along a random 
walk in the graph. As in the case of the complete graph, Algorithm [I] with s > 0 is then 
likely to enjoy near-optimal privacy guarantees. 


